MULTIPLE RECURRENCE AND CONVERGENCE
ALONG THE PRIMES

Abstract. Let $E \subset \mathbb{Z}$ be a set of positive upper density. Suppose that
$P_1, P_2, \ldots, P_k \in \mathbb{Z}[x]$ are polynomials having zero constant terms. We show
that the set $E \cap (E - P_1(p-1)) \cap \ldots \cap (E - P_k(p-1))$ is non-empty for some
prime number $p$. Furthermore, we prove convergence in $L^2$ of polynomial
multiple averages along the primes.

1. Introduction

Given a subset $E$ of the integers having positive upper density, the set $E - E$
of differences between pairs of elements of $E$ contains an element of the shape
$p - 1$, with $p$ a prime number. This conjecture of Erdős was proved by means
of the Hardy-Littlewood (circle) method by Sárközy [18] in a quantitative form
which shows that, if $E - E$ contains no shifted prime $p - 1$, then necessarily

$$x^{-1}\,\mathrm{card}(E \cap [1,x]) \ll \frac{(\log\log\log x)^3 (\log\log\log\log x)}{(\log\log x)^2}. \qquad (1.1)$$

Subsequent improvements, first by Lucier [15], and most recently by Ruzsa and
Sanders [17], show that the function on the right hand side in the conclusion
(1.1) may be replaced by $\exp(-c(\log x)^{1/4})$, for some positive absolute
constant $c$. Problems in which one asks for specified constellations of differences
between successive terms from a sequence of elements in $E$, each difference
depending on the same shifted prime, have been addressed only very recently.
Thus, for example, the problem of exhibiting non-trivial three term arithmetic
progressions from $E$, with common difference a shifted prime, was successfully
analysed by Frantzikinakis, Host and Kra [5], with the analogous problem for
longer arithmetic progressions conditional on the Inverse Conjecture for Gowers
Norms formulated by Green and Tao [8]. Our goal in this paper is the
unconditional resolution of a generalisation of these earlier results, an
analogue of the Bergelson-Leibman theorem [2], which exhibits a constellation of
differences defined by given polynomials whenever these polynomials have zero
constant terms.

In order to describe our conclusions, we must introduce some notation, and
this we use throughout. We denote by $[N]$ the discrete interval $\{1, \ldots, N\}$ of
natural numbers. Also, we write $|X|$ for the cardinality of a finite set $X$, and
when $X$ is non-empty, we write

$$\mathbb{E}_{x \in X} f(x) = \frac{1}{|X|} \sum_{x \in X} f(x).$$

Given a set of integers $E$ having positive upper density, and polynomials
$P_1, \ldots, P_k \in \mathbb{Z}[x]$, we define the return set $R_{P_1,\ldots,P_k}$ by

$$R_{P_1,\ldots,P_k} = \{ n \in \mathbb{Z} : E \cap (E - P_1(n)) \cap \ldots \cap (E - P_k(n)) \neq \emptyset \}.$$

Finally, we write $\mathbb{P}$ for the set of prime numbers. It is natural to conjecture that
return sets defined by polynomials with zero constant terms contain shifted
primes (see, for example, Conjecture 1.1 of [14]). Our first result confirms this
conjecture in full generality for the sets $\mathbb{P} \pm 1$ of shifted primes.

Theorem 1.1. Let $E$ be a set of integers having positive upper density, and
let $P_1, \ldots, P_k \in \mathbb{Z}[x]$ satisfy the condition that $P_i(0) = 0$ $(1 \le i \le k)$. Then
$R_{P_1,\ldots,P_k} \cap (\mathbb{P} + 1) \neq \emptyset$ and $R_{P_1,\ldots,P_k} \cap (\mathbb{P} - 1) \neq \emptyset$.
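
To make the definition of the return set concrete, the following minimal Python sketch searches for a shifted prime in the return set of a finite set $E$. It is an illustration only: the helper names (`is_prime`, `in_return_set`, `first_shifted_prime`), the finite window standing in for a set of positive upper density, and the choice $P_1(n) = n$, $P_2(n) = n^2$ are all assumptions made for the example, not part of the argument of this paper.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small illustrative inputs."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def in_return_set(E: set, polys, n: int) -> bool:
    """True when E, E - P_1(n), ..., E - P_k(n) share a common element.

    An element x lies in E - P(n) exactly when x + P(n) lies in E.
    """
    shifts = [P(n) for P in polys]
    return any(all(x + s in E for s in shifts) for x in E)


def first_shifted_prime(E: set, polys, shift: int, bound: int):
    """Smallest prime p <= bound with p + shift in the return set, else None."""
    for p in range(2, bound + 1):
        if is_prime(p) and in_return_set(E, polys, p + shift):
            return p
    return None


# Hypothetical example: E is the set of multiples of 3 in a finite window,
# with P_1(n) = n and P_2(n) = n^2 (both have zero constant term).
E = set(range(0, 3000, 3))
polys = [lambda n: n, lambda n: n * n]
print(first_shifted_prime(E, polys, -1, 50))  # looks for n = p - 1
print(first_shifted_prime(E, polys, +1, 50))  # looks for n = p + 1
```

A finite search of this kind can only exhibit examples; Theorem 1.1 asserts that such a prime exists for every set of positive upper density.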

We are also able to establish that polynomial averages converge when re- 
stricted to the prime numbers. 

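Theorem 1.2. Suppose that $X = (X_0, \mathcal{B}, \mu, T)$ is an invertible measure
preserving system. Let $f_1, \ldots, f_k \in L^\infty(X)$, and let $P_1, \ldots, P_k \in \mathbb{Z}[x]$. Then as
$N \to \infty$, the averages

$$\mathbb{E}_{p \in \mathbb{P} \cap [N]} \prod_{i=1}^{k} f_i(T^{P_i(p)} x)$$

converge in $L^2(X)$.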

The simplest case of Theorem 1.1 is that in which $k = 1$ and $P_1(n) = n$.
As we have already noted in our opening paragraph, this is the case that was
successfully considered by Sárközy [18] via the circle method. The convergence
of the averages asserted by Theorem 1.2 in this case was apparently first
demonstrated by Wierdl [21], and pointwise convergence has also been
established (see [1], [20]). In the special case $k = 2$ and $(P_1(n), P_2(n)) = (n, 2n)$,
the conclusions of Theorems 1.1 and 1.2 have been proved unconditionally
by Frantzikinakis, Host and Kra [5], and subject to the truth of the Inverse
Conjecture for Gowers Norms described in [8], this work extends also to any
positive integer $k$ and linear polynomials $P_i(n) = in$ $(1 \le i \le k)$. We note,
however, that the full conclusions of Theorems 1.1 and 1.2 do not follow from
the approach in [5], even if one is prepared to assume the latter Inverse
Conjecture. We remark also that Li and Pan [14] have very recently established
the case $k = 1$ of Theorem 1.1 when the set of shifted primes is $\mathbb{P} - 1$ (see
Corollary 1.1 of [14]).

Rather than attempt to wield control of the prime variable conjecturally
made available through Gowers norms, we instead seek control of convergence
through a variable from the set $E$, switching the roles of this variable and the
prime. Such a strategy, in which for less well controlled aspects of an analysis
one may crudely count prime variables by inclusion in larger well-behaved
subsets of the integers, is reasonably familiar to practitioners of the circle method
and sieve theory. The mechanism which makes this switching of roles effective
is the use of the local Gowers norms introduced in [19]. This allows us to
assume that the set $E$ possesses extra structure, namely a nilstructure. With
this information in hand, we are able to apply the recent work of Green and
Tao [9], showing that the Möbius function is orthogonal to (polynomial)
nilsequences, in combination with the Leibman structure theorem for multivariable
polynomial averages [11] in order to deliver the conclusions of Theorems 1.1
and 1.2.

^1 The Inverse Conjecture for the Gowers norm in the case $k = 4$ is by now known [10],
and thus the proof in [5] extends unconditionally to 4-term arithmetic progressions.

It seems likely that our methods could be adapted to handle modifications of
Theorems 1.1 and 1.2 in which the polynomials in $\mathbb{Z}[x]$ are replaced by general
integer-valued polynomials. Indeed, even the restriction to polynomials having
vanishing constant terms might be weakened through a modification of the
hypotheses of Theorem 1.1 to accommodate jointly intersective polynomials
(see [3] for the relevant ideas).

We have recorded a number of notational and technical preliminaries relating 
to the ergodic theory that we employ in two appendices at the end of this paper. 
Readers not already aficionados of the subject area would be well-advised to 
peruse this material before continuing further. In particular, we take this 
opportunity to emphasise that throughout this paper, whenever we refer to a 
measure preserving system, we implicitly assume this system to be invertible. 
In section 2 we outline our approach to the central problem of the paper. We
consider the prime return set $R_{P_1,\ldots,P_k}$ in section 3, providing the details of the
proof of Theorem 1.1. Section 4 is devoted to the convergence of polynomial
averages restricted to the primes, leading to the proof of Theorem 1.2.

The authors are grateful to the referees of this paper for their careful reading, 
detailed comments, and the consequent improvement in our exposition. 

2. Outline of proof 

We begin by considering a $k$-tuple of polynomials $\mathbf{P} = (P_1, \ldots, P_k)$, and a
set $E$ having upper density exceeding some positive number $\delta$. Our first step
is to translate the question on the prime return set into an ergodic theoretic
one via Furstenberg's correspondence principle. Thus we replace the set $E$ by
a measurable set $A$, of measure $\mu(A) > \delta$, in a probability measure preserving
system $X = (X, \mathcal{B}, \mu, T)$. By a uniform version of the Bergelson-Leibman
theorem (see Theorem 3.9 below), there is a positive number $c(\delta)$ with the
property that for any natural number $W$, one has

$$\lim_{N \to \infty} \mathbb{E}_{n \le N}\, \mu(T^{-P_1(Wn)}A \cap \ldots \cap T^{-P_k(Wn)}A) > c(\delta).$$

Here, we emphasise that the number $c(\delta)$ depends on $\delta$, as well as the
polynomials $P_1, \ldots, P_k$, but is independent of $A$ and $W$.

The ordered polynomial system $\mathcal{P} = \{P_1, \ldots, P_k\}$ determines, via PET
induction, the number of steps $l(\mathcal{P})$ that one must take, by repeated application
of the Cauchy-Schwarz inequality, to obtain a parallelepiped system of
polynomials independent of the parameter $n$. This in turn determines the sieve level
$R = N^{\eta}$, also independent of $W$, via the condition $\eta < 2^{-3-l(\mathcal{P})}$. All estimates
henceforth depend implicitly on $\delta$ and $\eta$.

In our next step, we take $w$ to be a slowly growing function of $N$, and put

$$W = \prod_{p < w,\ p \in \mathbb{P}} p.$$

We will sometimes need to fix $w$ (very large) and take $N$ much larger, and for
this reason it is useful to adopt the following convention concerning Landau's
$o$-notation within this paper. As usual, when a quantity approaches zero as
the main parameter $N$ approaches infinity, we shall say that this quantity is
$o(1)$. We denote by $o_{w \to \infty}(1)$ any quantity that approaches zero as $w \to \infty$.
Finally, we denote by $o_w(1)$ any quantity that, with $w$ fixed, approaches zero
as $N \to \infty$.

Next, let $b$ be an integer with $(b, W) = 1$. Perhaps it is worth noting that,
when it comes to establishing Theorem 1.2 in section 4, we must consider
all possible values of $b$. However, for the proof of Theorem 1.1 in section 3,
it transpires that the only values of $b$ of interest are $\pm 1$ (see the discussion
surrounding (3.4) below). We define the function $\Lambda_{w,b}(n)$ by putting

$$\Lambda_{w,b}(n) = \frac{\phi(W)}{W} \log R$$

when $Wn + b$ is a prime number^2, and otherwise by putting $\Lambda_{w,b}(n) = 0$. Here,
as usual, we write $\phi(W)$ for the Euler totient, so that $\phi(W) = \prod_{p < w}(p - 1)$.
In [19], an enveloping sieve argument is applied to show that there exists a
function $\nu_{w,b}(n)$ with the property that $\Lambda_{w,b}(n) \le \nu_{w,b}(n)$, so that $\Lambda_{w,b}$ is
pointwise bounded by $\nu_{w,b}$, and

$$\|\nu_{w,b} - 1\|_{V_{\mathcal{P}}} = o_{w \to \infty}(1).$$

Although we defer until later the definition of the norm here, it may be helpful
to note that it is similar to a Gowers norm, though with shift sizes short
with respect to $N$, but larger than the sieve level $R$. We remark that our
use of notation differs from that in [19], owing to the simpler nature of the
polynomials in question, as well as the absence of scaling issues which obviates
the need for the full structure theorem proved in [19].

^2 In the detailed account of our argument in section 3, we make the additional technical
restriction that $\Lambda_{w,b}(n)$ is thus defined only when $n \in [\frac{1}{2}N]$. The straightforward
complications associated with this constraint are best ignored in the present outline.

We examine the average

$$\mathbb{E}_{n \le N}\, \Lambda_{w,b}(n)\, \mu(T^{-P_1(Wn)}A \cap \ldots \cap T^{-P_k(Wn)}A), \qquad (2.1)$$

and make use of the majorant $\nu_{w,b}$ of $\Lambda_{w,b}$ to compare it to the related average

$$\mathbb{E}_{n \le N}\, \eta\, \mu(T^{-P_1(Wn)}A \cap \ldots \cap T^{-P_k(Wn)}A), \qquad (2.2)$$

which we already know to exceed $\eta c(\delta)$. Our aim is to show that the difference
between these averages is $o_{w \to \infty}(1)$. This we achieve in two steps. The
parameter $l(\mathcal{P})$ determines a factor $Z_{l(\mathcal{P})}(X)$ having the structure of an $(l(\mathcal{P}) - 1)$-step
nilsystem, this system being independent of $W$. In the first step, we show that
when $f_i$ is orthogonal to $Z_{l(\mathcal{P})}(X)$ for some index $i$ with $1 \le i \le k$, or
equivalently, when $\pi : X \to Z_{l(\mathcal{P})}(X)$ is the factor map and $\pi_* f_i = 0$, then



J fj^ 



UI— >oo ( f J 



As usual, here and throughout, we write $Tf(x)$ for $f(Tx)$. We then decompose
the characteristic function on $A$ by means of the trivial relation $1_A = \pi^*\pi_* 1_A +
(1_A - \pi^*\pi_* 1_A)$. Then $\pi_*(1_A - \pi^*\pi_* 1_A) = 0$, and thus

E A^,fe(n)/i(T-^i(^"M n . . . n T-^^^'^^M) 



E 



i=l 



This allows us to reduce to the situation in which the system $X$ is an $l(\mathcal{P})$-step
pro-nilsystem. In fact, technically speaking, we replace $A$ by the non-negative
function $\pi_* 1_A$, which has integral against $\pi_* \mu$ exceeding $\delta$. We note that the
universality of the constant $c(\delta)$ applies for any such function.

We make an additional reduction to the case in which $f$ is defined on a
nilsystem $(G/\Gamma, \mathcal{B}, \mu, T)$. This is achieved by means of an approximation in
$L^2$, and is independent of $w$. If this system is disconnected, then it can be
decomposed into a union of some finite number, $J$, of components $\{X_j\}_{j=1}^{J}$
having the property that $T^J : X_j \to X_j$ is totally ergodic for $1 \le j \le J$.

We now follow the argument of [8]. We replace $\Lambda_{w,b}(n)$ by the function

$$\frac{\phi(W)}{W} \Lambda(Wn + b),$$

in which $\Lambda$ denotes the classical von Mangoldt function. We then decompose
$\Lambda$ by means of a Möbius identity into the shape $\Lambda^\sharp + \Lambda^\flat$, corresponding to
an associated smooth decomposition of the identity function $\chi(x) = x$ in the
shape $\chi = \chi^\sharp + \chi^\flat$, with $\Lambda^\sharp$ associated to small divisors and $\Lambda^\flat$ associated to
large divisors, just as in [8]. Observe next that for any Lipschitz function $f$,
the expression

$$\prod_{i=1}^{k} (T^{P_i(Wn)} f(x))$$

is a polynomial nilsequence on $(G/\Gamma)^k$. As in [8], we show that the contribution
arising from the term

$$\mathbb{E}_{n \le N} \Big( \frac{\phi(W)}{W} \Lambda^\sharp(Wn + b) - 1 \Big) \prod_{i=1}^{k} (T^{P_i(Wn)} f(x))$$

is negligible. The estimate of the contribution arising from the term
corresponding to $\Lambda^\flat$ follows from Theorem 1.1 of [9], which asserts that the Möbius
function is orthogonal to polynomial nilsequences with bounds that depend
only on the degree of the polynomial and not on the polynomial itself.
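
The Möbius identity behind the decomposition of $\Lambda$ mentioned above is $\Lambda(n) = -\sum_{d \mid n} \mu(d) \log d$. The Python sketch below splits this divisor sum at a threshold `D` into a small-divisor part and a large-divisor part. It is a simplified illustration: the sharp cutoff is an assumption made for brevity (the decomposition in [8] uses a smooth cutoff), and the function names are hypothetical.

```python
import math


def mobius(n: int) -> int:
    """Moebius function mu(n), computed by trial factorisation."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a square divides the original n
            result = -result
        p += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p when n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n and n % p:
        p += 1
    if p * p > n:
        return math.log(n)  # n itself is prime
    m = n
    while m % p == 0:
        m //= p
    return math.log(p) if m == 1 else 0.0


def split_von_mangoldt(n: int, D: int):
    """Return (small, large) with small + large = Lambda(n).

    small collects the terms -mu(d) log d with d | n and d <= D,
    large collects those with d > D (sharp cutoff, for illustration only).
    """
    small = large = 0.0
    for d in range(1, n + 1):
        if n % d == 0:
            term = -mobius(d) * math.log(d)
            if d <= D:
                small += term
            else:
                large += term
    return small, large


for n in (7, 8, 12, 30):
    small, large = split_von_mangoldt(n, D=4)
    print(n, round(small + large, 10), round(von_mangoldt(n), 10))
```

The two printed columns agree for each $n$, which is just the Möbius identity; the analytic work lies in estimating the two parts separately.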

Our goal of showing that the averages (2.1) and (2.2) are asymptotically
equal is completed by combining the results of the last paragraph, and this
completes our outline of the proof.



3. Prime return sets 



Our objective in this section is the proof of Theorem 1.1. We begin with
a discussion of the pseudorandom measures employed in the sketch of the
argument provided in the previous section.

3.1. Pseudorandom measures. We first define a normalised counting
function for prime numbers, with a smoothing weight designed to flatten
distribution across a subset of residue classes. Let $\eta$ be a positive number with
$\eta < 2^{-3-l(\mathcal{P})}$, and put $R = N^{\eta}$. Define the function $\tilde{1} : [N] \to \mathbb{R}^+$ by putting
$\tilde{1}(x) = 1$ when $x \in [\frac{1}{2}N]$, and otherwise by taking $\tilde{1}(x) = 0$. In addition, define
$\Lambda_{w,b} : [N] \to \mathbb{R}^+$ by setting

$$\Lambda_{w,b}(x) = \frac{\phi(W)}{W} \log R, \qquad (3.1)$$

when $x \in [\frac{1}{2}N]$ and $Wx + b \in \mathbb{P}$, and otherwise by taking $\Lambda_{w,b}(x) = 0$. Here,
we choose to identify $[\frac{1}{2}N]$ with a subset of $\mathbb{Z}/N\mathbb{Z}$, in the usual manner. We
remark that the function $\Lambda_{w,b}(x)$ is a modification of the classical von Mangoldt
function $\Lambda(x)$. The use of $\log R$ in place of $\log N$, as a normalising factor, is
necessary in order to bound $\Lambda_{w,b}$ pointwise by the pseudorandom measure $\nu_{w,b}$
shortly to be defined. The ratio $\eta$ between $\log R$ and $\log N$ reflects the relative
density between the primes, and the almost primes occurring implicitly within
our argument.
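
As a numerical illustration of the definition (3.1), the following Python sketch tabulates $\Lambda_{w,b}$ on $[N]$ and computes its mean. The function names and the parameter values ($N = 2000$, $w = 6$, $b = 1$, $\eta = 0.1$) are assumptions chosen for the example; in the argument itself $w$ grows slowly with $N$ and $\eta$ is tied to $l(\mathcal{P})$.

```python
import math


def primes_below(limit: int) -> list:
    """Sieve of Eratosthenes: all primes p < limit."""
    if limit < 3:
        return []
    sieve = bytearray([1]) * limit
    sieve[0] = sieve[1] = 0
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i in range(limit) if sieve[i]]


def w_trick_modulus(w: int):
    """Return (W, phi(W)), where W is the product of the primes p < w."""
    W, phi = 1, 1
    for p in primes_below(w):
        W *= p
        phi *= p - 1
    return W, phi


def modified_von_mangoldt(N: int, w: int, b: int, eta: float) -> list:
    """Values Lambda_{w,b}(x) for x = 1..N, following (3.1).

    Lambda_{w,b}(x) = (phi(W)/W) * log R when x <= N/2 and W*x + b is prime,
    and 0 otherwise, where R = N**eta.
    """
    W, phi = w_trick_modulus(w)
    if math.gcd(b, W) != 1:
        raise ValueError("b must be coprime to W")
    log_R = eta * math.log(N)
    prime_set = set(primes_below(W * (N // 2) + b + 1))
    weight = phi / W * log_R
    return [weight if (x <= N // 2 and W * x + b in prime_set) else 0.0
            for x in range(1, N + 1)]


# Assumed illustrative parameters, not values used in the paper.
values = modified_von_mangoldt(N=2000, w=6, b=1, eta=0.1)
print("W, phi(W) =", w_trick_modulus(6))
print("mean of Lambda_{w,b} over [N]:", sum(values) / len(values))
```

For these small parameters the mean comes out as a modest fraction of $\eta$, in line with the normalisation by $\log R$ rather than $\log N$.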

An application of the Prime Number Theorem in arithmetic progressions 
with error term (see, for example. Corollary 11.21 of [16]) reveals that when b 
and W are coprime, one has 

It follows that $\Lambda_{w,b}$ has relatively large mean, namely

$$\mathbb{E}_{n \in [N]} \Lambda_{w,b}(n) \gg \eta.$$


Before announcing the key properties of the pseudorandom measure
employed in our argument, we must record some definitions. The first definition,
of a measure, comes from Definition 6.1 of [8].

Definition 3.1. A measure is a non-negative function $\nu : [N] \to \mathbb{R}^+$ with
the total mass estimate

$$\mathbb{E}_{n \in [N]} \nu(n) = 1 + o_{w \to \infty}(1), \qquad (3.2)$$

and such that for each positive number $\epsilon$, one has the crude pointwise bound

$$\nu(n) \ll_{\epsilon} N^{\epsilon}.$$

Next we define polynomial norms analogous to Gowers norms.

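Definition 3.2. Let $a$ be a function from $\mathbb{Z}$ into $\mathbb{C}$ supported in $[N]$. When $k$
is a non-negative integer, we define the $V_k$-norm of $a$ to be the quantity $\|a\|_{V_k}$
defined via the relation

$$\|a\|_{V_k}^{2^k} = \mathbb{E}_{n \le N}\ \mathbb{E}_{m_1,\ldots,m_k,\,m_1',\ldots,m_k' \le \sqrt{N}}\ \prod_{\omega \in \{0,1\}^k} a^{\omega}(n + \omega \cdot \mathbf{m} + (\mathbf{1} - \omega) \cdot \mathbf{m}').$$

Here, we write $\mathbf{1}$ for the vector $(1, 1, \ldots, 1)$, and we put $a^{\omega} = a$ when
$\sum_{i=1}^{k} \omega_i \equiv 0 \pmod 2$, and otherwise we put $a^{\omega} = \bar{a}$. Also, when
$\mathcal{P} = \{P_1, \ldots, P_k\}$ is a standard polynomial system with parallelepiped order
$l(\mathcal{P})$, we define the $V_{\mathcal{P}}$-norm of the function $a$ by $\|a\|_{V_{\mathcal{P}}} = \|a\|_{V_{l(\mathcal{P})}}$.

Observe that

$$\|a\|_{V_1}^{2} = \mathbb{E}_{n \le N} \Big| \mathbb{E}_{m \le \sqrt{N}}\, a(n + m) \Big|^2 \ge 0,$$

so that the definition of the $V_k$-norm makes sense when $k = 1$. For larger
values of $k$, such follows from the following lemma, which records two simple
properties of the $V_k$-norm useful in our subsequent deliberations.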

Lemma 3.3. Let $a$ be a function from $\mathbb{Z}$ into $\mathbb{C}$ supported in $[N]$. When $k$ is
a non-negative integer and $0 < j \le 2^k$, one has

$$\mathbb{E}_{m, m' \le \sqrt{N}} \|a(n + m)\overline{a(n + m')}\|_{V_k}^{j} \le \|a\|_{V_{k+1}}^{2j},$$

with equality when $j = 2^k$. If, moreover, the function $a$ has the property that
for each positive number $\epsilon$, one has the pointwise bound $a(n) = O_{\epsilon}(N^{\epsilon})$, then

$$\Big| \mathbb{E}_{n \le N}\, a(n) \Big| \le \|a\|_{V_1} + o(1).$$

Proof. The first claim follows at once from the definition of the $V_k$-norm, since
by Hölder's inequality one has

$$\mathbb{E}_{m, m'} \|a(n + m)\overline{a(n + m')}\|_{V_k}^{j} \le \Big( \mathbb{E}_{m, m'} \|a(n + m)\overline{a(n + m')}\|_{V_k}^{2^k} \Big)^{j/2^k},$$

and the expectation within parentheses on the right hand side here is equal to

$$\mathbb{E}_{n \le N}\ \mathbb{E}_{m_1,\ldots,m_{k+1},\,m_1',\ldots,m_{k+1}' \le \sqrt{N}}\ \prod_{\omega \in \{0,1\}^{k+1}} a^{\omega}(n + \omega \cdot \mathbf{m} + (\mathbf{1} - \omega) \cdot \mathbf{m}') = \|a\|_{V_{k+1}}^{2^{k+1}}.$$

The final conclusion of the lemma is essentially a consequence of the van der
Corput lemma, as in the proof of Lemma A.1 of [12], though here we are more
precise and do not restrict to real functions. Observe that, as a consequence
of our hypotheses concerning $a(n)$, one has

$$\mathbb{E}_{n \le N}\, a(n) = \mathbb{E}_{n \le N}\, \mathbb{E}_{m \le \sqrt{N}}\, a(n + m) + O(N^{\epsilon - 1/2}).$$

Interchanging the order of summation, an application of Cauchy's inequality
yields

$$\Big| \mathbb{E}_{n \le N}\, a(n) \Big|^2 \le \|a\|_{V_1}^{2} + o(1).$$

The desired conclusion is now immediate. □
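
The case $k = 1$ of the norm, and the second assertion of Lemma 3.3, can be checked numerically. The Python sketch below computes $\|a\|_{V_1}$ directly from the displayed formula, under the assumption that the inner average runs over $1 \le m \le \lfloor \sqrt{N} \rfloor$ and that $a$ vanishes outside $[N]$; the function names are hypothetical. The boundary error in the comparison is at most $M \sup|a| / N$ with $M = \lfloor \sqrt{N} \rfloor$, which plays the role of the $o(1)$ term.

```python
import math


def v1_norm(values) -> float:
    """V_1-norm of a, where values[n-1] = a(n) for n in [N] and a = 0 elsewhere.

    ||a||_{V_1}^2 = E_{n <= N} | E_{m <= M} a(n + m) |^2 with M = floor(sqrt(N)).
    """
    N = len(values)
    M = math.isqrt(N)

    def a(n):
        return values[n - 1] if 1 <= n <= N else 0

    total = 0.0
    for n in range(1, N + 1):
        inner = sum(a(n + m) for m in range(1, M + 1)) / M
        total += abs(inner) ** 2
    return math.sqrt(total / N)


def mean_abs(values) -> float:
    """|E_{n <= N} a(n)|."""
    return abs(sum(values) / len(values))


N = 400
constant = [1.0] * N
alternating = [(-1.0) ** n for n in range(1, N + 1)]

for name, seq in (("constant", constant), ("alternating", alternating)):
    print(name, "mean:", mean_abs(seq), "V_1 norm:", v1_norm(seq))
```

The constant sequence has norm close to 1, while the alternating sequence, which cancels on short intervals, has norm close to 0; this is the sense in which the norm measures structure on scale $\sqrt{N}$.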

The following theorem is essentially equivalent to Theorem 3.18 of [19], and
demonstrates the existence of a pseudorandom majorant^3.

Theorem 3.4. Let V be a standard polynomial system, and let rj = 2^^^'(^\ 
Then there exists a measure $\nu_{w,b}$ with the property that the function $\Lambda_{w,b}$, defined
in (3.1), enjoys the pointwise bound $0 \le \Lambda_{w,b} \le \nu_{w,b}$, and further

$$\|\nu_{w,b} - 1\|_{V_{\mathcal{P}}} = o_{w \to \infty}(1). \qquad (3.3)$$

We note that in [19], the parameter $w$ is concretely fixed to be of order
$\log \log \log N$. In present circumstances, meanwhile, we prefer to think of $w$ as
very (very) large, but constant, since in the ergodic convergence results we do
not have uniformity in $w$. To clarify the dependence on $w$, we use both the
notations $o_w(1)$ and $o_{w \to \infty}(1)$, as defined in section 2.

3.2. Translation to the ergodic world. We open the main thrust of our
argument by translating the basic question to an ergodic theoretic setting. We
achieve this goal by means of the Furstenberg Correspondence Principle (see,
for example, Furstenberg [6]).

Lemma 3.5. Let $E$ be a set of positive upper density in $\mathbb{Z}$. Then there exists
a measure preserving system $X = (X, \mathcal{B}, \mu, T)$, and an element $A$ of $\mathcal{B}$ with
$\mu(A) > 0$, with the property that when

$$\mu(A \cap T^{-n_1}A \cap \ldots \cap T^{-n_k}A) > 0,$$

then

$$E \cap (E - n_1) \cap \ldots \cap (E - n_k) \neq \emptyset.$$



^3 In modern language, the measure whose existence is asserted by Theorem 3.4 is described
as a pseudorandom measure, by virtue of the property (3.3).

Making use of ergodic decomposition, it follows as a corollary of this
conclusion that in order to prove Theorem 1.1 it suffices to establish the following
ergodic theoretic version of this theorem.

Theorem 3.6. Suppose that $X = (X, \mathcal{B}, \mu, T)$ is an ergodic measure preserving
system, and let $P_1, \ldots, P_k \in \mathbb{Z}[x]$ satisfy the condition $P_i(0) = 0$ $(1 \le i \le k)$.
In addition, suppose that $A \in \mathcal{B}$ satisfies the condition $\mu(A) > 0$. Let

$$S_{P_1,\ldots,P_k} = \{ n \in \mathbb{Z} : \mu(A \cap T^{-P_1(n)}A \cap \ldots \cap T^{-P_k(n)}A) > 0 \}.$$

Then

$$S_{P_1,\ldots,P_k} \cap (\mathbb{P} + 1) \neq \emptyset \quad \text{and} \quad S_{P_1,\ldots,P_k} \cap (\mathbb{P} - 1) \neq \emptyset.$$

As in many other recurrence results, it is easier to show that the set
$S_{P_1,\ldots,P_k} \cap (\mathbb{P} \pm 1)$ is large than merely showing that it is not empty. In
particular, it suffices to show that for any integer $b$ with $(b, W) = 1$, one has

$$\mathbb{E}_{n \le N,\ Wn + b \in \mathbb{P}}\ \mu(A \cap T^{-P_1(Wn)}A \cap \ldots \cap T^{-P_k(Wn)}A) > 0. \qquad (3.4)$$

Notice here that we have no useful control over $W$. However, since $(\pm 1, W) =
1$, it follows from the Siegel-Walfisz theorem (see, for example, Corollary 11.21
of [16]) that for large enough values of $N$ and $b = \pm 1$, the expectation in (3.4)
is taken over a non-empty set. Hence, the lower bound (3.4) is sufficient to
establish Theorem 3.6. On the other hand, the set $\mathbb{P} - 2$ is not a return set for
polynomial averages.

The next lemma is classical. 

Lemma 3.7. Suppose that |a„| < 1 for each integer n. Then one has 
E awn+b- E -^;r-^A{Wn + b)awn+b 

Wn+b& 

As a consequence of this result, one may replace the average on the left
hand side of (3.4) by a weighted average, wherein the weights are given by a
modified von Mangoldt function. This conclusion we summarise in the next
lemma.

Lemma 3.8. Suppose that $\mu(A) > \delta$, for some positive number $\delta$. Then, in
order to establish the lower bound (3.4), it suffices to show that

$$\mathbb{E}_{n \le N}\, \Lambda_{w,b}(n)\, \mu(A \cap T^{-P_1(Wn)}A \cap \ldots \cap T^{-P_k(Wn)}A) \gg_{\delta} 1 + o_w(1) + o_{w \to \infty}(1).$$

Equivalently, writing $1_A(x)$ for the characteristic function of the set $A$, it
suffices to show that

$$\mathbb{E}_{n \le N} \int \Lambda_{w,b}(n)\, 1_A(x) \prod_{j=1}^{k} (T^{P_j(Wn)} 1_A(x))\, d\mu \gg_{\delta} 1 + o_w(1) + o_{w \to \infty}(1). \qquad (3.5)$$

In order to confirm (3.5), we require two additional results. The first treats an
analogous situation in which the von Mangoldt weights are absent, a
quantitative version of the Polynomial Szemerédi theorem.

Theorem 3.9. With the notation and assumptions of the previous section,
suppose that $\delta > 0$, and let $g : X \to \mathbb{R}$ be any function obeying the pointwise
bound $0 \le g \le 1 + o(1)$, together with the mean bound $\int_X g\, d\mu \ge \delta - o(1)$.
Then we have

$$\mathbb{E}_{n \le N} \int g(x) \prod_{j=1}^{k} (T^{P_j(Wn)} g(x))\, d\mu \ge c(\delta) - o(1),$$

where $c(\delta)$ is a positive number depending on $\delta$ and $P_1, \ldots, P_k$, but independent
of $W$.

Proof. This follows from Theorem 3.2 of [19]. □

We also require the following structure theorem, due to Leibman [11],
identifying nilsystems as characteristic factors for multivariate polynomial multiple
averages.

Theorem 3.10. Suppose that $X = (X, \mathcal{B}, \mu, T)$ is an ergodic measure
preserving system. Let $Q_1, \ldots, Q_s \in \mathbb{Z}[x_1, \ldots, x_m]$ be polynomials. In addition, let $\mathcal{Q}$
denote $\{Q_1, \ldots, Q_s\}$. Then there exists a factor $Y = (Y, \mathcal{D}, \nu, S)$ of $X$, with
$\pi : X \to Y$ as the factor map, and an integer $d(\mathcal{Q})$, such that:

(i) the system $Y$ has the structure of an inverse limit of $d(\mathcal{Q})$-step
nilsystems, and

(ii) the average difference 

j=l 1=1 

is $o_w(1)$ in $L^2(X)$. Here, we have written $\mathbf{M}$ for $[M_1] \times \ldots \times [M_m]$,
and the convergence is as $M_1, \ldots, M_m \to \infty$.

Note that the rate of convergence in this theorem may depend on $w$. What
is crucial is that the integer $d(\mathcal{Q})$ is independent of $w$.

We at last come to the result of this section which does the heavy lifting in 
our argument. This provides a conclusion on orthogonality to nilsystems. 

Proposition 3.11. Suppose that $X$ is an ergodic measure preserving system.
Let $f_1, \ldots, f_k \in L^\infty(X)$ be functions satisfying the condition $\|f_j\|_\infty \le L$ $(1 \le
j \le k)$. Then there exists a factor $Y$ of $X$, with $\pi : X \to Y$ as the factor map,
and an integer $d(\mathcal{P})$, such that

(i) the system $Y$ has the structure of an inverse limit of $d(\mathcal{P})$-step
nilsystems, and

(ii) if, for some index $i$, one has $\pi_* f_i = 0$, then



.7 = 1 



ol.«,(1) + ol 
r



/ E A^,,(n)TTT^^(^")/,(x)f//i 

J n^N -^-^ 



Observe first that, by the invariance of the measure $\mu$ under the action of $T$,
it follows that for each positive number $M$, one has



r 



J n^N KM-^-- 



PjiWn)+Wl 



fj{x)dfi 



Consequently, by applying the Cauchy-Schwarz inequality in combination with 
the triangle inequality, one obtains 



r ^ 



E K,b{n) 



E 



Pj{Wn)+Wl 



KM 



dfi. 



By Theorem 3.4, the modified von Mangoldt function $\Lambda_{w,b}(n)$ is pointwise
bounded by the pseudorandom majorant $\nu_{w,b}(n)$, and hence we may replace
the former by the latter in the last upper bound for T. Proceeding first in this 
way, and then applying the Cauchy-Schwarz inequality once again, we deduce 
that 



nPj{Wn)+Wl 



^ /( E i^u^M E TTt 

^ /( E i^u>,b{n))(E T^n>,b{n) E TTt 



dfi 



PjiWn)+Wlj,^^^ 



dfi. 



Our goal in the remainder of the proof is to establish that the integral 

k 



u 



E v^A'fA 



E\{t 



Pj{Wn)+Wl 



KM 



fjix) 



dfi 



(3.6) 



satisfies 

= Ol,u,{1) + OL,w^ooi^). (3.7) 

Since equation (3.2) provides the estimate

$$\mathbb{E}_{n \le N}\, \nu_{w,b}(n) = 1 + o_{w \to \infty}(1)$$

for the average of the measure $\nu_{w,b}(n)$, it follows from our earlier estimate for
T together with (3.6) and (3.7) that

r ^ (1 + {l))U = OL,wil) + Ol,w^oo{1), 

and this suffices to complete the proof of the theorem. 



/ 
We now focus on (3.6), expanding the square in the integrand to obtain

k 
j = l 

Consider the average 

/ E E l\T''^^'^''\f){x)T'^^"'-'^J^{x))dfi. 

Take M to be a real number with M = N'^^-^\ In addition, write 

$$Q_{2i-1}(n, m, l) = P_i(n) \quad \text{and} \quad Q_{2i}(n, m, l) = P_i(n) + m - l \quad (1 \le i \le k),$$

and put $\mathcal{Q} = \{Q_1, \ldots, Q_{2k}\}$. Let $Y$ be the factor supplied by Theorem 3.10
associated with $\mathcal{Q}$. Then if for some $i$ one has $\pi_* f_i = 0$, then from the latter
theorem it follows that the above average is $o_{L,w}(1)$.

In view of the above discussion, it suffices to show that for any continuous
bounded functions $g_1, \ldots, g_k$ with $\|g_i\|_\infty \le L^2$, one has



/ E - l)T\T''^^'^''\g,{x)T'^^"^-%{x)) d/i = OL,M + OL, 

.3 = 1 



We establish the latter by applying PET induction to show that, whenever $a$
is a function from $\mathbb{Z}$ into $\mathbb{C}$ supported in $[N]$, and satisfying $a(n) = O_{\epsilon}(N^{\epsilon})$
for every $\epsilon > 0$, then one has

$$\int \mathbb{E}_{n \le N}\, a(n)\, g_0(x) \prod_{j=1}^{k} T^{P_j(Wn)} g_j(x)\, d\mu \ll_L \|a\|_{V_{\mathcal{P}}} + o(1). \qquad (3.8)$$

The procedure here is very similar to that applied in [19], but unfortunately it
does not fit precisely into the framework of the latter. We therefore repeat the
process in the present context. The trick is to insert some additional averaging
by means of a parameter $M$ of order $\sqrt{N}$. An important observation, in this
context, is that since the polynomials may be supposed distinct, with zero
constant terms, then the system $\{P_1, \ldots, P_k\}$ may be reordered in such a way
that we obtain a standard system.

We first establish the case in which $\mathcal{P}$ is a standard linear system. Thus
we suppose that $\mathcal{P} = \{P_1, \ldots, P_k\}$ is a standard linear system, and prove by
induction on $k$ that

k 

E ain)go{x)T\T''^^'^'^^g^{x)dfi ||aH + o(l). 
For $k = 1$, we must estimate the absolute value of the integral
We observe first that I* = Xi + 0^(1), where we have written 

n^N J m.<x/N 
By the Cauchy-Schwarz inequality, one has



(iy{n+m)) 



(i/i 



1/2 



By another application of the Cauchy-Schwarz inequality, we obtain the upper 
bound 



X^<L E 



E 



E a(n + m)a{n + m')T^i(^("+'"^)^i(s)T^^(^("+'"'))^i(x) d/i. 

m,m' ^-s/N 



Consequently, by the triangle inequality. 



Xf <L E 

m,m' 



E a(n + m)a(n + m') / g^{x)T^^^^^'^'-'^^^{x) dfi 



Thus, on applying Lemma 3.3 and making yet another application of the
Cauchy-Schwarz inequality, we deduce that

Xi'<L E {\\a{n + m)a{n + m')\\v,+o{l)f <t:\\afy^+o{l). 

This confirms the inductive hypothesis when k = 1. 

Suppose now that K > 1, and the inductive hypothesis holds for k < K. In 
this case we evaluate the expression 



K 



Ea{n)g,{x)\{T''^^'^^^g,{x)d^i. 



(3.9) 



As before, we first obtain the relation X^ = Xr- + ol(1), where 

K 

Next, following an application of the Cauchy-Schwarz inequality, we obtain 

K 



Xx <L E (/ E a{n + m)f[T 



p,(iy(n+™))^^.(a;) 



1/2 
A further application of the Cauchy-Schwarz inequality leads to the relation



^K^L E [ E a{n + m)f[T 

E / E a(ri + m)a{n + m') 



^ 2 



K K 

X J]t^^(^("+-))(7^.(x) J]T^^(^("+'"'))^^.(x) rf/i. (3.10) 

i=i i=i 

Next, owing to the invariance of $\mu$ under the action of $T$, we see that
X^<i E / E a{n + m)a{n + m') 

where 

= T(^^-^i)(^™)(7,(x)T^^(^'"')-^i(^"^)^^.(x) (1 ^ J ^ K). 

As a consequence of the inductive hypothesis, we therefore deduce by means
of Lemma 3.3 that



Xi<L E {\\a{n + m)a{n + m')\\v^ + o{l)) <^\\a{n)fy^^^^+o{l). 

m,m' 

This confirms the inductive hypothesis, for standard linear systems, when 
k = K. The inductive hypothesis consequently holds for all standard linear 
systems. 

We now apply the PET induction scheme so as to reduce the general case
to one in which the system $\mathcal{P}$ is standard and linear. We proceed by induction
on the weight $w(\mathcal{P})$ of the polynomial system $\mathcal{P}$. Suppose that the desired
conclusion holds for every standard polynomial system $\mathcal{P}$ with weight $w(\mathcal{P}) <
w$. Since we have already established the desired conclusion for every standard
linear system, we may suppose that $\mathcal{P}$ is a standard polynomial system of
weight $w(\mathcal{P}) = w$ that is non-linear. As in the linear case, we begin by inserting
some additional averaging over a variable $m$ running over an interval of length
$\sqrt{N}$. Thus we evaluate the expression

X; = / E a{n)go{x)r\T^^^'^-'>gj{x)dfi. 

J n^N j-J^ 
The argument leading from (3.9) to (3.10) may now be applied, without
modification, to show that = Xp + ol(1), where



E / E a{n + m)a{n + m') 



m,in' 

j=i i=i 



Next, applying the invariance of $\mu$ under the action of $T$, we find that

k 



pj2-P,WnW))-Pi(14^(n+H)^^.(3;)rf^. (3.11) 



k 

X 



When $q$ and $q'$ are integers, define the set of polynomials $\mathcal{R}_0(q, q')$ by

$$\mathcal{R}_0(q, q') = \{ P_j(W(n + q')) - P_1(W(n + q)) \}_{1 \le j \le k}.$$

Consider the set of polynomials $\mathcal{R}' = \mathcal{R}_0(m, m) \cup \mathcal{R}_0(m, m')$, and let

$$\mathcal{R} = \{R_1, \ldots, R_J\}$$

denote the system obtained from $\mathcal{R}'$ by removing the polynomials in $\mathcal{R}'$ of
degree zero with respect to $n$. Then by Lemma B.1 the set of polynomials
$\mathcal{R}$ has lower weight with respect to $n$ than the set $\mathcal{P}$, so that $w(\mathcal{R}) < w$.
Moreover, the estimate (3.11) takes the shape

Iv-^lE [ho{x) E a{n + m)a{n + m')l[T''^^'^''^''''^h,{x)dfi, 

in which $h_j$ satisfies $\|h_j\|_\infty = O_L(1)$ $(0 \le j \le J)$. Therefore, by the inductive
hypothesis in combination with Lemma 3.3, we may conclude that

Xp<L E {\\a{n + m)a{n + m')\\v^ + o{l)) <^\\a{n)fy^+o{l). 

m,m' 

This confirms the inductive hypothesis for all standard polynomial systems of
weight $w$, and hence the inductive hypothesis (3.8) has now been established
for all polynomial systems. □

Corollary 3.12. Provided that the lower bound (3.5) holds in the special case 
wherein X is an inverse limit of nilsystems of bounded step, then it holds also 
without restriction. 

Proof. Let $Y$ be the factor supplied by Proposition 3.11, and let $\pi : X \to Y$ be
the associated projection. Decompose the characteristic function $1_A$ by means
of the identity $1_A = \pi^*\pi_* 1_A + (1_A - \pi^*\pi_* 1_A)$. Then one sees that



+ 0^(1) + 



□ 



We make one further reduction, from an inverse limit of nilsystems to a
nilsystem proper, and replace $1_A$ with a Lipschitz continuous function. This
we accomplish by means of a standard approximation argument, obtaining a
conclusion independent of $w$.

Lemma 3.13. Suppose that $X$ is an ergodic measure preserving system. Let
$f_1, \ldots, f_k \in L^\infty(X)$, and suppose that $\|f_j\|_\infty \le 1$. If $g \in L^\infty(X)$ satisfies the
condition $\|f_1 - g\|_2 < \varepsilon$, then



J=2 



OJe). 



Proof. By applying the triangle inequality in combination with the Cauchy- 
Schwarz inequality, we obtain 



E A,,MT''^^'^''\fi-9){x)llT''^^'^''^f,{x)dii 



i=2 



^ E Am(^) 



i=2 



^ EAbAn)\\fi-9hYl\\fj\\oo<sil + oM)- 

j=2 



□ 



We are consequently able to conclude as follows. 
Corollary 3.14. Provided that the lower bound

$$\mathbb{E}_{n \le N} \int \Lambda_{w,b}(n)\, g(x) \prod_{j=1}^{k} (T^{P_j(Wn)} g(x))\, d\mu \gg_{\delta} 1 + o_w(1) + o_{w \to \infty}(1)$$

holds in the special case wherein $X$ is a nilsystem, and $g$ is a Lipschitz
continuous function satisfying the hypotheses of Theorem 3.9, then it holds also
without restriction.

Now let $X = (G/\Gamma, \mathcal{B}, \mu, T)$ be an ergodic nilsystem, with the transformation
$T$ being given by $a \in G$, and write $G^0$ for the identity component of $G$. If $X$ is
disconnected, then $X^0 = G^0\Gamma/\Gamma = G^0/(\Gamma \cap G^0)$ is a connected component of
$X$. Since $X$ is compact, one finds that $X$ is a disjoint union of finitely many
translations of $X^0$, say $X = \bigcup_{i=1}^{J} a^i X^0$, and the nilsystem $X_0 = (X^0, a^J)$ has no
finite factors. We may now assume further, without loss of generality, that the
system $X_0$ is of the form $(L/\Lambda, b)$, where $L$ is connected and simply connected
(see the discussion at the beginning of section 1 of Leibman [13]). In $L$ there
is an element $c$ with $c^J = b$. Since $a^i$ induces an isomorphism between $X_0$ and
$X_i = (a^i X^0, a^J)$, the same holds for $X_i$.

For any Lipschitz continuous function $f$, and any fixed $x \in X$, the sequence
$g_{x,w}(n)$, defined by

$$g_{x,w}(n) = f(a^{P_1(Wn)}x) \cdots f(a^{P_k(Wn)}x), \tag{3.12}$$

is a polynomial nilsequence. Note that for a fixed integer $i$ with $1 \le i \le J$,
the set

$$\{(a^{P_1(Wn)}x, \ldots, a^{P_k(Wn)}x)\}_{n \equiv i \pmod{J}}$$

is contained in a fixed connected component of $(G/\Gamma)^k$. Furthermore, if $n =
mJ + i$, then

where P/ and g may depend on i, W, I, J. Thus /(a^'^^"^x) can be viewed as 
a polynomial nilsequence on the nilmanifold L/A. 
Now consider

$$g_{x,w,i}(n) = 1_{n \equiv i \pmod{J}} f(a^{P_1(Wn)}x) \cdots f(a^{P_k(Wn)}x).$$

The function $1_{n \equiv i \pmod{J}}$ is a 1-step nilsequence on the torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. It is
defined by the polynomial $g : \mathbb{Z} \to \mathbb{R}$ given by $g(n) = n/J$, and a function
$F : \mathbb{T} \to [0, 1]$ which is Lipschitz, supported on a $1/(10J)$ neighborhood of
$i/J$, and for which $F(i/J) = 1$. Thus $g_{x,w,i}$ is a polynomial nilsequence on the
product of $\mathbb{T}$ and a connected component of $(G/\Gamma)^k$, with new Lipschitz constant
that may depend also on $J$.
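
The construction of the indicator of a residue class as a 1-step nilsequence can be checked directly. The following Python sketch evaluates $F(g(n))$ with $g(n) = n/J$ reduced modulo 1. The tent-shaped choice of $F$ and the names `torus_distance`, `tent` and `residue_indicator` are illustrative assumptions; the text requires only that $F$ be Lipschitz, supported within $1/(10J)$ of $i/J$, and equal to 1 at $i/J$.

```python
from fractions import Fraction


def torus_distance(s, t):
    """Distance between s and t on the torus R/Z (exact rational arithmetic)."""
    d = (s - t) % 1
    return min(d, 1 - d)


def tent(t, i, J):
    """Assumed choice of F: height 1 at i/J, vanishing beyond distance 1/(10J)."""
    return max(Fraction(0), 1 - 10 * J * torus_distance(t, Fraction(i, J)))


def residue_indicator(n, i, J):
    """Evaluate F(g(n)), where g(n) = n/J is taken modulo 1."""
    return tent(Fraction(n, J) % 1, i, J)
```

Since distinct residue classes modulo $J$ are mapped to points of the torus at distance at least $1/J$, this particular $F$, which has Lipschitz constant $10J$, separates them.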

The upshot of the above discussion is that

$$g_{x,w}(n) = \sum_{i=1}^{J} g_{x,w,i}(n),$$

and thus $g_{x,w}(n)$ can be viewed as a polynomial nilsequence on a nilmanifold
$G/\Gamma$, where the group $G$ is connected and simply connected.

Proposition 3.15. With the notation and assumptions in the preamble, one 
has 

E (^w,bin) - 7]l{n))g^An) = o^,||/||Lip(l)- (3-13) 

This proposition is essentially the polynomial version of Proposition 11.3 of
[8]. We sketch a proof below. One of the main ingredients is the following
lemma, which is the polynomial version of Proposition 11.2 of [8]. Since the
proof is essentially the same, we omit it, though we note that one could also
prove this lemma using Proposition 11.2 of [8] and the fact that a polynomial 
nilsequence can be viewed as a linear nilsequence on some nilmanifold of larger 
nilpotence degree. This is shown in Leibman fTI] in the context of continuous 
nilsequences. All that would be required is to verify that Leibman's proof is 
valid for Lipschitz nilsequences, and is independent of W. 

Lemma 3.16. Let $F(n)$ be a polynomial nilsequence in $G/\Gamma$ defined by
polynomials from the set $\mathcal{P} = \{P_1, \ldots, P_k\}$, as in (3.12), and suppose that $F$
has Lipschitz constant $M$. Let $\varepsilon$ be a positive number, and suppose that
$N \ge 1$. Then there exist integers $r(\mathcal{P})$ and $t(\mathcal{P})$, and a decomposition $F(n) =
F_1(n) + F_2(n)$, where $F_1$ is an averaged nilsequence on $(G/\Gamma)^{t(\mathcal{P})}$ with Lipschitz
constant $O_{M,\varepsilon,G/\Gamma}(1)$, and satisfying

$$\|F_1(n)\|_{U^{r(\mathcal{P})}[N]^*} \ll_{M,\varepsilon,G/\Gamma} 1,$$

and where $\|F_2\|_\infty = O(\varepsilon)$.
We now replace Kw^bin) by 

KM)=v'^^^{Wn + h)l{n), 

where $\Lambda$ is the classical von Mangoldt function. This is permissible for
averaging purposes in view of the fact that the difference is negligible on average.
To this end, we follow section 12 of [8]. Define the function $\chi : \mathbb{R}^+ \to \mathbb{R}^+$ by
putting $\chi(x) = x$. We decompose $\chi$ via the identity $\chi = \chi^\sharp + \chi^\flat$, where $\chi^\sharp$ is
a smooth function vanishing for $|x| \ge 1$, and $\chi^\flat$ a smooth function vanishing
for $|x| \le \frac{1}{2}$. This induces a decomposition $\Lambda = \Lambda^\sharp + \Lambda^\flat$, with

$$\Lambda^\sharp(n) = -\log R \sum_{d | n} \mu(d) \chi^\sharp\Bigl(\frac{\log d}{\log R}\Bigr)$$

and

$$\Lambda^\flat(n) = -\log R \sum_{d | n} \mu(d) \chi^\flat\Bigl(\frac{\log d}{\log R}\Bigr).$$


Recall the definition of $g_{x,w}(n)$, and define $F_x$ by means of the relation
$F_x(Wn) = g_{x,w}(n)$. Let $\varepsilon > 0$ be sufficiently small, and apply Lemma 3.16 to
decompose $F_x(Wn)$ in the form $F_{x,1}(Wn) + F_{x,2}(Wn)$, with conditions silently
implied by the suffices 1 and 2. From here, following the argument of in
order to accommodate the harmless additional factor $1(n)$, one finds that

E (^^AKWn + b)-l)l{n)Fx,,{Wn) 



^,£,ll/llLip(l) 



Wn + b) - 1 l(n) \\FxAWn)\\u.,r^^^^ 
On the other hand, in view of the upper bound $\|F_{x,2}(Wn)\|_\infty \ll \varepsilon$, we see that
E (^^A^iWn + b)- l)l{n)F,,2{Wn) 



< £ E 



Taking e now to be a positive function of decreasing to zero sufficiently 
slowly, it follows from the triangle inequality that 



E 



(^A»(H^n + 6) -l)lHF.(l^n) I =0^,11^11, Jl). (3.14) 



For the remaining part of the dissection, we apply Theorem 1.1 of [9]. The
sequence

$$h_{x,w}(n) = g_{x,w}((n - b)/W)$$

is a polynomial nilsequence on the same group with the same Lipschitz constant
(with a polynomial sequence depending on $W$), and in addition is of the same
degree. Moreover, one has

$$\mathbb{E}_{n \le N} \Lambda^\flat(Wn + b) g_{x,w}(n) 1(n) = \mathbb{E}_{\substack{b < n \le NW + b \\ n \equiv b \pmod{W}}} \Lambda^\flat(n) h_{x,w}(n) 1((n - b)/W).$$

The average on the left hand side of (3.13) may therefore be successfully
estimated by showing that

logi? E E MxH^7^)h.A^d) = o^,y\i,.^^{l). (3.15) 

m^^NW+b d<:{^NW+b)/m MOg-K/ 
md=b (mod W) 

Fortunately, Theorem 1.1 of [9] implies a bound of the shape 

$$\mathbb{E}_{\substack{d \le [M] \\ md \equiv b \pmod{W}}} \mu(d) h_{x,w}(md) \ll (1 + \|f\|_{\mathrm{Lip}})(\log(M/W))^{-A},$$

valid for any positive number $A$ and $M \ge 2W$. We note, in particular, that
this bound is independent of the polynomial sequence (it depends only on the 
degree), and that there is no restriction on the size of the coefficients of the lat- 
ter polynomial. Since the weight x^(x) vanishes for |x| ^ |, it follows that the 
average over d in fl3.15p makes no contribution when [(iA^VT + 6)/m] < i?^/^. 
Consequently, since the weight x^(x) is smooth, we find by partial summation 
that when m G [^A^VT + &], the inner average on the left hand side of fl3.15p is 
equal to 

E Mx'C^)h.A^d) (1 + ||/||Lip)(logi?)-^. 

Ri/2<d^(\NW+b)/m \iOgn/ 
md=b (mod W) 

We therefore deduce that 

E E Mx'C^)KAmd) «^ (1 + ||/||Lip)(logi?)"^, 

m!S:^NW+b d^{^NW+b)/m MOg/t/ 
md=b (mod W) 
and, provided that we take $A > 1$, this suffices to deliver the estimate claimed
in (3.15).

The conclusion of Proposition 3.15 is obtained by combining the conclusions
of (3.14) and (3.15). From here, in view of Corollary 3.14, the lower bound
(3.5) follows on noting that by Theorem 3.9, one has

E l{n) [ U(x)TT(T^^(^")U(x))ti/i»5 1 + 0(1). 

J = l 

The lower bound (3.4) now follows from Lemma 3.8, and this completes the
proof of Theorem 1.1.

4. Convergence of multiple averages along the primes 

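In this section we prove the convergence of polynomial multiple averages
along the primes. Let $f_1, \ldots, f_k$ be bounded functions. Consider the averages

$$A_N(x) = \frac{1}{\pi(N)} \sum_{\substack{p < N \\ p \text{ prime}}} \prod_{j=1}^{k} T^{P_j(p)} f_j(x).$$

We seek to show that the sequence $\{A_N(x)\}_{N=1}^{\infty}$ forms a Cauchy sequence in
$L^2$. Observe first that the sequence $\{A_N(x)\}_{N=1}^{\infty}$ is Cauchy if and only if the
sequence $\{B_N(x)\}_{N=1}^{\infty}$ is Cauchy, where

$$B_N(x) = \frac{1}{N} \sum_{n < N} \Lambda(n) \prod_{j=1}^{k} T^{P_j(n)} f_j(x).$$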

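Indeed, independently of the value of $x$, one has

$$\Bigl| \frac{1}{N} \sum_{n < N} \Lambda(n) \prod_{j=1}^{k} T^{P_j(n)} f_j(x) - \frac{1}{\pi(N)} \sum_{\substack{p < N \\ p \text{ prime}}} \prod_{j=1}^{k} T^{P_j(p)} f_j(x) \Bigr| = o(1).$$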

It remains now only to show that the sequence $\{B_N(x)\}_{N=1}^{\infty}$ is Cauchy, and
this we achieve by applying a stronger version of Proposition 3.11 that we now
briefly pause to establish. This may be regarded as a result on orthogonality
to nilsystems in $L^2$.

Proposition 4.1. Suppose that $X$ is an ergodic measure preserving system.
Let $f_1, \ldots, f_k \in L^\infty(X)$, and suppose that $\|f_j\|_\infty \le L$ $(1 \le j \le k)$. Then there
exists a factor $Y$ of $X$, with $\pi : X \to Y$ the factor map, and an integer $d(\mathcal{P})$,
with the following properties:

(i) the factor $Y$ has the structure of an inverse limit of $d(\mathcal{P})$-step
nilsystems, and

(ii) if for some index $i$ one has $\pi_* f_i = 0$, then, uniformly in $b$, one has

k 

E A»,6(n) n T^^(^"+^)/,(a;) = 0^,^,(1) + 0^,^^0,(1). 
Proof. In order to establish the proposition, it suffices to confirm that the
expression

k 



M 



^ AT 



X] 



satisfies the asymptotic relation 
But if we write 



9n{x) = YIt 



Pj{Wn+b) 



then an application of the triangle inequality to the expansion of M yields 



M= E A^,6(n)A^,fe(m) 

n,m<N 



^ E K,bin] 



(4.1) 



However, the hypotheses of the proposition imply that $\|g_n(x)\|_\infty \le L^k$, and so
it follows from Proposition 3.11 that, uniformly in $b$, one has



m^N J 



The desired conclusion now follows on substituting this estimate into (4.1). □

We now return to the proof of Theorem 1.2. Suppose that $f_1, \ldots, f_k \in
L^\infty(X)$, and that $\|f_j\|_\infty \le L$ $(1 \le j \le k)$. Let $M$ be a large natural number, and
put $N = 2M$. Observe that since the von Mangoldt function $\Lambda$ is supported
on prime powers, one has



Bwm{x)= e ahTTt^^^")/,- 

n<WM , 



X 



^ y E^ 

n<M W 



0^b<W 
{b,W)=l 



{b,W)=l 
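
The splitting of a sum weighted by the von Mangoldt function into residue classes coprime to $W$ rests on an elementary observation: the terms with $(n, W) > 1$ come only from powers of primes dividing $W$. The Python sketch below checks this for a scalar bounded weight `f` standing in for the product of shifted functions. The function names and the sample values of `W` and `M` are assumptions made for illustration and are not taken from the text.

```python
import math


def von_mangoldt(n):
    """Classical von Mangoldt function: log p if n is a power of a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # n itself is prime


def split_by_residue(f, W, M):
    """Return the full weighted sum over n < W*M and its part over residues coprime to W."""
    full = sum(von_mangoldt(n) * f(n) for n in range(W * M))
    coprime = sum(
        von_mangoldt(W * n + b) * f(W * n + b)
        for b in range(W) if math.gcd(b, W) == 1
        for n in range(M)
    )
    return full, coprime
```

For a weight bounded by 1 in absolute value, the difference `full - coprime` is at most the number of primes dividing $W$ times $\log(WM)$, which is negligible once both sums are divided by $WM$.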
By applying the triangle inequality in combination with Proposition 4.1, we
obtain

k k 



n<N , n<N . , 



2 



,w (1) + Ol 



in which $\pi$ is the projection onto the relevant nilpotent factor supplied by
Proposition 3.11. By Proposition 3.15, meanwhile, one has

k 

E iv-'KAn) -TH) r[T^^(^"+^)7r*7rJ,(a;) = 0^,^(1). 

n<N -*■ 2 

Consider next the average $C_{b,M}(x)$ defined by

k 

a,Af(x)= E l[T^^'~'^''+'^f){x). 

It follows from Leibman [H] that the sequence $\{C_{b,M}(x)\}_{M=1}^{\infty}$ converges, and
is thus a Cauchy sequence. Fix a positive number $\varepsilon$. Then whenever $M_1$ and
$M_2$ are sufficiently large, one has

Under the same conditions, moreover, it follows from the triangle inequality 
in combination with the conclusions of the previous paragraph that 

^wmX^)-^;^ XI Cb^MX^) (^ = 1,2). 



(piW) 



(fe,VK)=l 

Consequently, again by the triangle inequality, one finds that whenever $M_1$
and $M_2$ are sufficiently large, one has $\|B_{WM_1}(x) - B_{WM_2}(x)\|_2 \le 3\varepsilon$, and thus
the sequence $\{B_{WM}(x)\}_{M=1}^{\infty}$ is Cauchy. Finally, since for $1 \le i \le W$, one has

$$B_{WM+i}(x) = B_{WM}(x) + o_L(1),$$

the sequence $\{B_M(x)\}_{M=1}^{\infty}$ is also Cauchy. This confirms our earlier claim, and
thus the proof of Theorem 1.2 is complete.

Appendix A. Ergodic theoretic preliminaries 

We take the opportunity here to prepare some of the infrastructure central 
to the ergodic theory employed in the main body of this paper. 

A.1. Measure preserving systems. We begin by recalling that a measure
preserving transformation on a measure space $(X_0, \mathcal{B}_X, \mu_X)$ is a map
$T : X_0 \to X_0$ satisfying the property that, for all $B \in \mathcal{B}_X$, one has $\mu_X(T^{-1}B) =
\mu_X(B)$. A probability measure preserving system (m.p.s.) $X$ is a quadruplet
$(X_0, \mathcal{B}_X, \mu_X, T)$, where the triple $(X_0, \mathcal{B}_X, \mu_X)$ is a probability measure space,
and $T : X_0 \to X_0$ a measure preserving transformation. We define the $L^p$
spaces $L^p(X) = L^p(X_0, \mathcal{B}_X, \mu_X)$ for $1 \le p \le \infty$ in the usual manner. Thus, in
particular, we identify any two functions in $L^p(X)$ which agree $\mu$-almost
everywhere. If $X_0$ is a point, we write $X = \mathrm{pt}$. We will assume throughout this
paper that the measure preserving system $X$ is regular, which is to say that
$X_0$ is a compact metric space and $\mathcal{B}_X$ consists of all Borel sets in $X$. There
is no loss of generality in this assumption since any m.p.s. $X$ such that $\mathcal{B}_X$ is
generated by a countable set is equivalent to a regular one.

A factor map $\pi_Y : X \to Y$ is a morphism in the category of measure
preserving systems. A factor $(Y_0, \mathcal{B}_Y, \mu_Y, S, \pi_Y)$ of a system $X = (X_0, \mathcal{B}_X, \mu_X, T)$
is a measure preserving system $Y = (Y_0, \mathcal{B}_Y, \mu_Y, S)$ together with a factor map
$\pi_Y : X \to Y$. In these circumstances, the pushforward $(\pi_Y)_* \mu_X$ is equal to
$\mu_Y$, and the relation $\pi_Y \circ T = S \circ \pi_Y$ holds $\mu_X$-almost everywhere. When
$f : Y \to \mathbb{C}$ is a measurable map, we write $(\pi_Y)^* f : X \to \mathbb{C}$ for the pullback
defined by $(\pi_Y)^* f = f \circ \pi_Y$. Conversely, when $f \in L^2(X)$, we denote
by $(\pi_Y)_* f \in L^2(Y)$ the pushforward of $f$. We then define the conditional
expectation of $f$ to $Y$ by

$$\mathbb{E}(f|Y) = (\pi_Y)^* (\pi_Y)_* f \in L^2(X).$$

We say that $f \in L^2(X)$ is $Y$-measurable when $f = \mathbb{E}(f|Y)$, or equivalently,
when $f = (\pi_Y)^* F$ for some $F \in L^2(Y)$. In circumstances wherein $Y_0$ is a
point, we say that $Y$ is trivial and denote $Y$ as pt. Thus, for instance, we may
write $(\pi_{\mathrm{pt}})_* f = \int_X f \, d\mu_X$. When there is no ambiguity we write $\pi$ for $\pi_Y$. It
is convenient when confusion is readily avoided to abuse notation by writing
$X$ for the system $(X_0, \mathcal{B}_X, \mu_X, T)$, or for the measure space $(X_0, \mathcal{B}_X, \mu_X)$, or
simply for the phase space $X_0$.

A.2. Nilsystems and nilsequences. A $k$-step nilsystem $X$ is a measure
preserving system $(X_0, \mathcal{B}_X, \mu_X, T)$, in which $X_0 = G/\Gamma$, for some $k$-step nilpotent
Lie group $G$ and a cocompact lattice $\Gamma$, and $\mathcal{B}_X$ is the Borel $\sigma$-algebra, $\mu_X$ the
Haar measure, and the measure preserving transformation $T : G/\Gamma \to G/\Gamma$ is
given by a rotation by some group element $a \in G$, which is to say that $T(g\Gamma) =
ag\Gamma$. A $k$-step (linear) nilsequence is a sequence of the form $\{F(a^n x)\}_{n \in \mathbb{N}}$,
where $x \in G/\Gamma$ and $F : G/\Gamma \to \mathbb{R}$ is a continuous function. We endow
the nilmanifold $G/\Gamma$ with a smooth Riemannian metric $d$. Let $g : \mathbb{N} \to G$.
For $h \in \mathbb{N}$ we denote $\partial_h g(n) = g(n + h) g^{-1}(n)$. A function $g : \mathbb{N} \to G$ is
called a polynomial sequence of degree $< k$ if, for any $h_1, \ldots, h_k \in \mathbb{N}$, one has
$\partial_{h_1} \cdots \partial_{h_k} g(n) = 1_G$. A degree $< k$ polynomial nilsequence is a sequence of
the form $\{F(g(n)x)\}_{n \in \mathbb{N}}$, where $x \in G/\Gamma$ and $F : G/\Gamma \to \mathbb{R}$ is a continuous
function, and $g$ is a polynomial sequence of degree $< k$. We say that a
nilsequence $\{F(g(n)x)\}$ has Lipschitz constant $L$ if the function $F$ has Lipschitz
constant $L$. In circumstances in which the representation of the nilsequence is
not explicit, we define the Lipschitz constant by taking the infimum over all
possible representations.
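
The definition of a polynomial sequence through iterated derivatives $\partial_h g(n) = g(n+h)g^{-1}(n)$ can be tested in a small example. The Python sketch below works in the integer Heisenberg group, a 2-step nilpotent group chosen here purely for illustration; the encoding of group elements as triples and all function names are assumptions. It considers $g(n) = a^n b^n$ for two non-commuting elements $a$ and $b$.

```python
IDENTITY = (0, 0, 0)


def mul(g, h):
    """Multiply Heisenberg elements (x, y, z), standing for [[1, x, z], [0, 1, y], [0, 0, 1]]."""
    return (g[0] + h[0], g[1] + h[1], g[2] + h[2] + g[0] * h[1])


def inv(g):
    """Inverse of a Heisenberg element."""
    return (-g[0], -g[1], g[0] * g[1] - g[2])


def power(g, n):
    """n-fold product of g with itself, for n >= 0."""
    result = IDENTITY
    for _ in range(n):
        result = mul(result, g)
    return result


def derivative(seq, h):
    """Return the sequence n -> seq(n + h) * seq(n)^{-1}."""
    return lambda n: mul(seq(n + h), inv(seq(n)))


def iterated_derivative(seq, shifts):
    """Apply the derivative with each shift in turn."""
    for h in shifts:
        seq = derivative(seq, h)
    return seq


a, b = (1, 0, 0), (0, 1, 0)


def example_sequence(n):
    """The sequence g(n) = a^n b^n."""
    return mul(power(a, n), power(b, n))
```

In this example two derivatives leave a non-trivial central element, while three derivatives give the identity, so the sequence has degree $< 3$ but not degree $< 2$ in the sense of the definition. The linear sequence $n \mapsto a^n$, by contrast, is already annihilated by two derivatives.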

We next define the Gowers norms, introduced in Lemma 3.9 of [7]. Let $a$
be a function from $\mathbb{Z}/N\mathbb{Z}$ into $\mathbb{C}$. When $k$ is a non-negative integer, we define
the $U^k[N]$-norm of $a$ to be the quantity $\|a\|_{U^k[N]}$ defined via the relation

$$\|a\|_{U^k[N]}^{2^k} = \mathbb{E}_{n, m_1, \ldots, m_k \in \mathbb{Z}/N\mathbb{Z}} \prod_{\omega \in \{0,1\}^k} a^{\omega}(n + \omega \cdot m),$$

where $a^{\omega} = a$ when $\sum_{i=1}^{k} \omega_i \equiv 0 \pmod{2}$, and otherwise $a^{\omega} = \bar{a}$. Next, we
define the dual norm to the Gowers norm by means of the relation

$$\|F\|_{U^k[N]^*} = \sup\Bigl\{ \Bigl| \mathbb{E}_{n \in [N]} f(n) F(n) \Bigr| : \|f\|_{U^k[N]} \le 1 \Bigr\}.$$
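
For small $N$ and $k$ the Gowers norm can be evaluated by brute force straight from its definition. The Python sketch below does this for a function on $\mathbb{Z}/N\mathbb{Z}$ given as a list of values. The function name `gowers_norm` is an assumption, and the running time grows like $N^{k+1} 2^k$, so this is intended only as an illustration of the definition.

```python
from itertools import product


def gowers_norm(a, k):
    """Brute-force U^k[N] norm of a function on Z/NZ given by the list a of length N."""
    N = len(a)
    total = 0
    for n in range(N):
        for m in product(range(N), repeat=k):
            term = 1
            for omega in product((0, 1), repeat=k):
                shift = sum(w * x for w, x in zip(omega, m))
                value = a[(n + shift) % N]
                if sum(omega) % 2 == 1:
                    # conjugate on the vertices with an odd number of ones
                    value = complex(value).conjugate()
                term *= value
            total += term
    mean = total / N ** (k + 1)
    # the mean is real and non-negative in exact arithmetic for k >= 1
    return abs(mean) ** (1.0 / 2 ** k)
```

For instance, the alternating function $n \mapsto (-1)^n$ on $\mathbb{Z}/4\mathbb{Z}$ has mean zero, so its $U^1$ norm vanishes, whereas its $U^2$ norm equals 1 because it is a character.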

Finally, an averaged $k$-step nilsequence with Lipschitz constant $M$ is a
function $F(n)$ having the form

$$F(n) = \mathbb{E}_{i \in I} F_i(a_i^n x_i),$$

where $I$ is a finite index set, and for each $i \in I$, the expression $F_i(a_i^n x_i)$ is a
bounded $k$-step nilsequence on $G/\Gamma$ with Lipschitz constant not exceeding $M$.



Appendix B. PET induction 

The notion of PET induction was introduced by Bergelson in [1] as a
mechanism for establishing a Polynomial Ergodic Theorem (or PET) for a weakly
mixing system. We introduce the framework required to apply PET induction
in this appendix so as to assist in our exposition elsewhere in this paper.

A polynomial system is a set of polynomials $\mathcal{P} = \{P_1(n), \ldots, P_k(n)\}$, where
$P_i(n) \in \mathbb{Z}[n]$ $(1 \le i \le k)$. The degree of $\mathcal{P}$ is the maximum of the degrees of the
polynomials lying in $\mathcal{P}$. We define an equivalence relation on $\mathbb{Z}[n]$ by defining
the polynomials $P$ and $Q$ to be equivalent when $\deg(P - Q) < \deg P$. We then
define the degree of an equivalence class to be the degree of its elements. Any
polynomial system $\mathcal{P}$ can be partitioned into equivalence classes. For each
positive integer $l$, let $w_l$ be the number of classes of degree $l$ in $\mathcal{P}$. Then the
weight $w(\mathcal{P})$ of the system $\mathcal{P}$ is defined to be the vector $(w_1, \ldots, w_{\deg \mathcal{P}})$. Next
we establish an order relation on weight vectors. Given two integer vectors
$v = (v_1, \ldots, v_r)$ and $w = (w_1, \ldots, w_s)$, we write $v < w$ if either $r < s$, or
else $r = s$ and there is an index $n$ for which $v_j = w_j$ $(n < j \le r)$ and
$v_n < w_n$. Subject to this relation, the set of weights of polynomial systems is
well-ordered. The PET induction is an induction on this well-ordered set.

An ordered system $\mathcal{P} = \{P_1, \ldots, P_k\}$ is standard if $\deg P_i > 0$ for $1 \le i \le k$,
$\deg(P_i - P_j) > 0$ for $i \ne j$, and in addition $P_1$ has minimal degree in $\mathcal{P}$. The
system is linear if each polynomial in $\mathcal{P}$ is linear. The following lemma shows
that standard systems are well-behaved with respect to a natural differencing
operation.

Lemma B.1. Let $\mathcal{P} = \{P_1, \ldots, P_k\}$ be an ordered polynomial system satisfying
the property that $P_1$ has minimal degree in $\mathcal{P}$. Given a positive integer $m$, let
$\mathcal{Q}'_m$ be the system defined by

$$\mathcal{Q}'_m = \{P_j(n + m) - P_1(n) : 1 \le j \le k\},$$

and let $\mathcal{Q}^*_m$ be the set of polynomials lying in $\mathcal{Q}'_m \cup \mathcal{Q}'_0$ having degree zero in
terms of $n$. Finally, denote by $\mathcal{Q} = \mathcal{Q}_m(\mathcal{P})$ the system obtained from the
set $(\mathcal{Q}'_m \cup \mathcal{Q}'_0) \setminus \mathcal{Q}^*_m$ by reordering, if possible, so as to respect the conditions
described in the preamble. Then

(i) when $\mathcal{P}$ is standard and non-linear, the system $\mathcal{Q}$ is standard of weight
strictly smaller than the weight of $\mathcal{P}$, and

(ii) when $\mathcal{P}$ is linear, the system $\mathcal{Q}$ is of weight strictly smaller than the
weight of $\mathcal{P}$, though possibly non-standard.

Note that when $\mathcal{P}$ is a system of weight (1), then the system $\mathcal{Q}$, associated
to $\mathcal{P}$ by the lemma, is empty. Given a standard polynomial system $\mathcal{P}$, the
number of steps of the type described in the lemma required to reach the
empty system is called the parallelepiped degree of $\mathcal{P}$, denoted by $l(\mathcal{P})$.

Example B.2. Consider the situation in which $\mathcal{P} = \{n^2, n^2 + n\}$. Then $\mathcal{P}$ is
standard of weight (0, 1). The system $\mathcal{Q}_m(\mathcal{P})$ associated to $\mathcal{P}$ by Lemma B.1 is
the system $\mathcal{P}_1 = \{n, 2mn + m^2, 2mn + n + m^2 + m\}$, which is standard of weight
(3). A second application of the lemma associates the system $\mathcal{P}_2 = \mathcal{Q}_k(\mathcal{P}_1)$ to
$\mathcal{P}_1$, and this is the system

$$\{2mn - n + m^2, \; 2mn + m^2 + m, \; 2mn - n + 2km + m^2, \; 2mn + 2mk + m^2 + m + k\},$$

which is non-standard of weight (2). Another application yields the system
$\mathcal{P}_3 = \mathcal{Q}_l(\mathcal{P}_2)$, namely

$$\{n + m, \; n + 2mk + m + k, \; n + 2ml + m, \; n + 2mk + 2ml + m + k\},$$

which is of weight (1). Finally, one last application gives the empty set. Thus
we may conclude that $l(\mathcal{P}) = 4$.
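
The weight computation and the differencing step of Lemma B.1 can be traced mechanically. The Python sketch below represents a polynomial in $n$ by its list of integer coefficients (constant term first) and replaces the formal shift parameters by numerical values, which is an assumption made to keep the code short; for special numerical shifts, leading coefficients that are distinct as polynomials in the parameters could coincide. All function names are illustrative.

```python
from math import comb


def degree(p):
    """Degree of the coefficient sequence p = (c0, c1, ...); -1 for the zero polynomial."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return -1


def trim(p):
    """Drop trailing zero coefficients and return a tuple."""
    return tuple(p[: degree(p) + 1])


def shift(p, m):
    """Coefficients of p(n + m), via the binomial theorem."""
    out = [0] * len(p)
    for i, c in enumerate(p):
        for j in range(i + 1):
            out[j] += c * comb(i, j) * m ** (i - j)
    return out


def sub(p, q):
    """Coefficients of p - q."""
    size = max(len(p), len(q))
    p = list(p) + [0] * (size - len(p))
    q = list(q) + [0] * (size - len(q))
    return [x - y for x, y in zip(p, q)]


def weight(system):
    """Weight vector (w_1, ..., w_d); a class is fixed by degree and leading coefficient."""
    classes = {(degree(p), p[degree(p)]) for p in system if degree(p) >= 1}
    d = max((deg for deg, _ in classes), default=0)
    return tuple(sum(1 for deg, _ in classes if deg == l) for l in range(1, d + 1))


def weight_less(v, w):
    """Order on weights: shorter vectors first, then compare from the top degree down."""
    if len(v) != len(w):
        return len(v) < len(w)
    return tuple(reversed(v)) < tuple(reversed(w))


def difference_system(system, m):
    """One differencing step: P_j(n) - P_1(n) and P_j(n + m) - P_1(n), constants removed."""
    if not system:
        return []
    ordered = sorted(system, key=degree)  # a polynomial of minimal degree comes first
    p1 = ordered[0]
    candidates = [sub(p, p1) for p in ordered] + [sub(shift(p, m), p1) for p in ordered]
    result = []
    for q in candidates:
        q = trim(q)
        if degree(q) >= 1 and q not in result:
            result.append(q)
    return result


def pet_weights(system, shifts):
    """Weights of the systems met when differencing with the given shifts in turn."""
    current = [trim(p) for p in system]
    weights = [weight(current)]
    for m in shifts:
        if not current:
            break
        current = difference_system(current, m)
        weights.append(weight(current))
    return weights
```

Calling `pet_weights([(0, 0, 1), (0, 1, 1)], [1, 2, 3, 5])` for the system $\{n^2, n^2 + n\}$ produces the weights (0, 1), (3), (2), (1) and finally the empty weight, each smaller than its predecessor in the order above, so four steps are needed, in agreement with Example B.2.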